Appendix A.1: Data Preparation

Neural Information Processing Systems

In Criteo, the labels indicate whether or not the user clicked the item. Table 4 reports statistics of the datasets used; a smaller relative ranking means a more important cross-feature. Figure 7 presents the training curves of the human-designed baseline models alongside the architecture searched by our PROFIT.


Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms

Jinng, Yunpeng, Liu, Qunfeng

arXiv.org Machine Learning

Evaluating performance across optimization algorithms on many problems presents a complex challenge due to the diversity of numerical scales involved. Traditional data processing methods, such as hypothesis testing and Bayesian inference, often employ ranking-based methods to normalize performance values across these varying scales. However, a significant issue emerges with this ranking-based approach: the introduction of new algorithms can potentially disrupt the original rankings. This paper extensively explores the problem, making a compelling case to underscore the issue and conducting a thorough analysis of its root causes. These efforts pave the way for a comprehensive examination of potential solutions. Building on this research, this paper introduces a new mathematical model called "absolute ranking" and a sampling-based computational method. These contributions come with practical implementation recommendations, aimed at providing a more robust framework for addressing the challenge of numerical scale variation in the assessment of performance across multiple algorithms and problems.
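The disruption the abstract describes is easy to reproduce. The following sketch (all algorithm names and score values are invented for illustration) averages each algorithm's per-problem rank, the usual ranking-based normalization, and shows a new entrant flipping the relative order of two existing algorithms:

```python
# Hypothetical illustration: mean-rank normalization across problems,
# and how adding a new algorithm can reorder the original ones.

def mean_ranks(scores):
    """scores: {algorithm: list of per-problem values}, lower is better.
    Returns {algorithm: mean rank over all problems}."""
    algos = list(scores)
    n = len(next(iter(scores.values())))
    total = {a: 0 for a in algos}
    for p in range(n):
        for rank, a in enumerate(sorted(algos, key=lambda a: scores[a][p]), 1):
            total[a] += rank
    return {a: total[a] / n for a in algos}

scores = {
    "A": [1.0, 1.0, 2.0, 2.0, 2.0],
    "B": [3.0, 3.0, 1.0, 1.0, 1.0],
    "D": [2.0, 2.0, 3.0, 3.0, 3.0],
}
before = mean_ranks(scores)              # A: 1.6, B: 1.8 -> A ranked above B

scores["C"] = [9.0, 9.0, 1.5, 1.5, 1.5]  # a new algorithm enters the pool
after = mean_ranks(scores)               # A: 2.2, B: 1.8 -> the A/B order flips
print(before["A"], before["B"], after["A"], after["B"])
```

The flip happens because C lands strictly between B and A on the problems B wins, pushing A's ranks down there without touching B's, which is exactly the scale-free-but-unstable behavior the paper's "absolute ranking" model is meant to avoid.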


Can We Use Large Language Models to Fill Relevance Judgment Holes?

Abbasiantaeb, Zahra, Meng, Chuan, Azzopardi, Leif, Aliannejadi, Mohammad

arXiv.org Artificial Intelligence

Incomplete relevance judgments limit the re-usability of test collections. When new systems are compared against the previous systems used to build the pool of judged documents, they often do so at a disadvantage due to the ``holes'' in the test collection (i.e., pockets of un-assessed documents returned by the new system). In this paper, we take initial steps towards extending existing test collections by employing Large Language Models (LLMs) to fill the holes, leveraging and grounding the method using existing human judgments. We explore this problem in the context of Conversational Search using TREC iKAT, where information needs are highly dynamic and the responses (and the results retrieved) are much more varied (leaving bigger holes). While previous work has shown that automatic judgments from LLMs result in highly correlated rankings, we find substantially lower correlations when human plus automatic judgments are used (regardless of LLM, one/two/few-shot prompting, or fine-tuning). We further find that, depending on the LLM employed, new runs are highly favored (or penalized), and this effect is magnified proportionally to the size of the holes. Instead, one should generate LLM annotations for the whole document pool to achieve rankings more consistent with human-generated labels. Future work on prompt engineering and fine-tuning is required so that LLMs reflect and represent the human annotations, grounding and aligning the models to make them more fit for purpose.
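The hole-filling setup can be sketched in a few lines. This is not the paper's pipeline, just a minimal illustration under invented names: human labels take precedence, and only (query, document) pairs a new run retrieved but no assessor judged are sent to an LLM judge (stubbed here as a plain callable):

```python
# Minimal sketch (function names and labels are hypothetical): fill the
# "holes" in human-judged qrels with labels from an LLM judge for documents
# a new run retrieved but assessors never saw.

def fill_holes(human_qrels, run, llm_judge):
    """human_qrels: {(query_id, doc_id): relevance label}
    run: iterable of (query_id, doc_id) pairs returned by the new system
    llm_judge: callable (query_id, doc_id) -> relevance label
    Returns a merged qrels dict; human labels always take precedence."""
    merged = dict(human_qrels)
    for qid, did in run:
        if (qid, did) not in merged:          # this pair is a "hole"
            merged[(qid, did)] = llm_judge(qid, did)
    return merged

human = {("q1", "d1"): 1, ("q1", "d2"): 0}
new_run = [("q1", "d1"), ("q1", "d3")]        # d3 was never assessed
merged = fill_holes(human, new_run, lambda q, d: 1)  # stub LLM judge
print(merged)   # d3 now carries an LLM label; d1 keeps its human label
```

The paper's finding suggests the alternative strategy: apply `llm_judge` to the entire pool rather than only the holes, so every system is scored under a single, consistent labeling regime.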


Reasoning Over Paths via Knowledge Base Completion

Sudhahar, Saatviga, Roberts, Ian, Pierleoni, Andrea

arXiv.org Artificial Intelligence

This is crucial for the use of large knowledge bases in many downstream applications. Explaining the predictions given by a KBC algorithm is also quite important for several real-world use cases. For example, in recommender systems, a knowledge graph of users, items, and their interactions is used to recommend an item to a user based on the user's interactions with several items. The ability to explain and reason about the decision is of critical importance for adding knowledge to recommender systems. Similarly, in a knowledge graph consisting of human biological data such as genes, drugs, symptoms, and diseases, it is crucial to know which genes and symptoms were involved in predicting a drug for a disease. This requires the automatic extraction and ranking of multi-hop paths between a given source and target entity in a knowledge graph. Previous work has focused on using path information in knowledge graphs for KBC, known as path-based inference (Lao et al., 2011; Gardner et al., 2014; Neelakantan et al., 2015; Das et al., 2017b), in which a model is trained to predict missing links between a given pair of entities, taking as input several paths that exist between them. Paths are ranked according to a scoring method and used as features to train the model. Embedding-based inference models (Bordes et al., 2013; Lin et al., 2015; Nickel et al., 2011; Socher et al., 2013; Trouillon et al., 2016) for KBC learn entity and relation embeddings by solving an optimization problem that maximises the plausibility of known facts in the knowledge graph.
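The path-extraction step that path-based inference builds on can be sketched as a bounded depth-first search over triples. The toy graph, entity names, and the length-based ordering below are all invented; the paper's actual models learn a path score rather than preferring shorter paths:

```python
# Hypothetical sketch: enumerate multi-hop relation paths between a source
# and a target entity in a toy knowledge graph, the prerequisite for
# path-based KBC inference.

from collections import defaultdict

def find_paths(edges, source, target, max_hops=3):
    """edges: list of (head, relation, tail) triples.
    Returns all relation paths from source to target up to max_hops."""
    adj = defaultdict(list)
    for h, r, t in edges:
        adj[h].append((r, t))
    paths = []

    def walk(node, rels, visited):
        if node == target and rels:
            paths.append(tuple(rels))
            return
        if len(rels) == max_hops:
            return
        for rel, nxt in adj[node]:
            if nxt not in visited:            # avoid cycles
                walk(nxt, rels + [rel], visited | {nxt})

    walk(source, [], {source})
    # Shorter paths first: a simple stand-in for a learned path score.
    return sorted(paths, key=len)

kg = [
    ("drugX", "targets", "geneA"),
    ("geneA", "associated_with", "diseaseY"),
    ("drugX", "treats_symptom", "symptomB"),
    ("symptomB", "symptom_of", "diseaseY"),
]
print(find_paths(kg, "drugX", "diseaseY"))
```

Each returned relation sequence (e.g. drug targets a gene that is associated with the disease) is the kind of human-readable evidence chain the abstract argues a KBC system should surface alongside its prediction.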


Riffled Independence for Efficient Inference with Partial Rankings

Huang, J., Kapoor, A., Guestrin, C.

Journal of Artificial Intelligence Research

Distributions over rankings are used to model data in a multitude of real world settings such as preference analysis and political elections. Modeling such distributions presents several computational challenges, however, due to the factorial size of the set of rankings over an item set. Some of these challenges are quite familiar to the artificial intelligence community, such as how to compactly represent a distribution over a combinatorially large space, and how to efficiently perform probabilistic inference with these representations. With respect to ranking, however, there is the additional challenge of what we refer to as human task complexity -- users are rarely willing to provide a full ranking over a long list of candidates, instead often preferring to provide partial ranking information. Simultaneously addressing all of these challenges -- i.e., designing a compactly representable model which is amenable to efficient inference and can be learned using partial ranking data -- is a difficult task, but is necessary if we would like to scale to problems with nontrivial size. In this paper, we show that the recently proposed riffled independence assumptions cleanly and efficiently address each of the above challenges. In particular, we establish a tight mathematical connection between the concepts of riffled independence and of partial rankings. This correspondence not only allows us to then develop efficient and exact algorithms for performing inference tasks using riffled independence based representations with partial rankings, but somewhat surprisingly, also shows that efficient inference is not possible for riffle independent models (in a certain sense) with observations which do not take the form of partial rankings. Finally, using our inference algorithm, we introduce the first method for learning riffled independence based models from partially ranked data.
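The riffled independence factorization the abstract refers to can be illustrated generatively: rank two disjoint blocks of items independently, then interleave the two rankings. The sketch below uses uniform block rankings and a uniform interleaving purely for simplicity; the item names are invented, and the JAIR paper treats general, learned interleaving distributions rather than this special case:

```python
# Hypothetical sketch of sampling from a riffle independent model: draw a
# ranking of block A and of block B independently, then interleave them
# (here, a uniform riffle shuffle).

import random

def sample_riffle_independent(items_a, items_b, rng=random):
    """Sample a full ranking of items_a + items_b under riffled independence
    with uniform rankings on each block and a uniform interleaving."""
    ranking_a = list(items_a)
    ranking_b = list(items_b)
    rng.shuffle(ranking_a)                 # independent ranking of block A
    rng.shuffle(ranking_b)                 # independent ranking of block B
    n = len(ranking_a) + len(ranking_b)
    # Choose which of the n positions receive A-items: the interleaving.
    slots_a = set(rng.sample(range(n), len(ranking_a)))
    it_a, it_b = iter(ranking_a), iter(ranking_b)
    return [next(it_a) if i in slots_a else next(it_b) for i in range(n)]

print(sample_riffle_independent(["a1", "a2"], ["b1", "b2", "b3"]))
```

A partial ranking (e.g. "every A-item beats every B-item") constrains only the interleaving, which is the intuition behind the tight connection between partial rankings and riffled independence that the paper formalizes.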